Introduction

This project is an analysis of fast food restaurants in the United States as of May 2019. The data, collected by Datafiniti and available on Kaggle (https://www.kaggle.com/datafiniti/fast-food-restaurants), contains 10,000 records of US fast food restaurants, with fields including name, address, city, and more.

Using this data, I hope to better understand the fast food market in America. We see fast food all around us, but rarely think about it as a whole. To that end, I break the analysis into three questions:

What are the most popular fast food restaurants in the US?
In what US location is fast food most abundant?
Is the number of fast food restaurants related to population?

Data Import and First Look

Library imports:

library(ggplot2)
library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tm)

## Loading required package: NLP

## 
## Attaching package: 'NLP'

## The following object is masked from 'package:ggplot2':
## 
##     annotate

library(usmap)
library(knitr)

Next I import the dataset and look into the structure by using head to examine the first few observations.

ff_data <- read_csv("Datafiniti_Fast_Food_Restaurants.csv")

## Parsed with column specification:
## cols(
##   id = col_character(),
##   dateAdded = col_datetime(format = ""),
##   dateUpdated = col_datetime(format = ""),
##   address = col_character(),
##   categories = col_character(),
##   city = col_character(),
##   country = col_character(),
##   keys = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   name = col_character(),
##   postalCode = col_character(),
##   province = col_character(),
##   sourceURLs = col_character(),
##   websites = col_character()
## )

kable(head(ff_data))

id	dateAdded	dateUpdated	address	categories	city	country	keys	latitude	longitude	name	postalCode	province	sourceURLs	websites
AVwcmSyZIN2L1WUfmxyw	2015-10-19 23:47:58	2018-06-26 03:00:14	800 N Canal Blvd	American Restaurant and Fast Food Restaurant	Thibodaux	US	us/la/thibodaux/800ncanalblvd/1780593795	29.81470	-90.81474	SONIC Drive In	70301	LA	https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3/menu,https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3,http://tripadvisor.com/Restaurant_Review-g40459-d4654052-Reviews-Sonic_Drive_In-Thibodaux_Louisiana.html,https://www.yellowpages.com/thibodaux-la/mip/sonic-drive-in-468367546	https://locations.sonicdrivein.com/la/thibodaux/800-north-canal-boulevard.html,http://sonicdrivein.com,http://www.sonicdrivein.com
AVwcmSyZIN2L1WUfmxyw	2015-10-19 23:47:58	2018-06-26 03:00:14	800 N Canal Blvd	Fast Food Restaurants	Thibodaux	US	us/la/thibodaux/800ncanalblvd/1780593795	29.81470	-90.81474	SONIC Drive In	70301	LA	https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3/menu,https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3,http://tripadvisor.com/Restaurant_Review-g40459-d4654052-Reviews-Sonic_Drive_In-Thibodaux_Louisiana.html,https://www.yellowpages.com/thibodaux-la/mip/sonic-drive-in-468367546	https://locations.sonicdrivein.com/la/thibodaux/800-north-canal-boulevard.html,http://sonicdrivein.com,http://www.sonicdrivein.com
AVwcopQoByjofQCxgfVa	2016-03-29 05:06:36	2018-06-26 02:59:52	206 Wears Valley Rd	Fast Food Restaurant	Pigeon Forge	US	us/tn/pigeonforge/206wearsvalleyrd/-864103396	35.80379	-83.58055	Taco Bell	37863	TN	https://www.yellowpages.com/pigeon-forge-tn/mip/taco-bell-474241430,https://foursquare.com/v/taco-bell/4ded6885d22deb0316df557d/menu,https://foursquare.com/v/taco-bell/4ded6885d22deb0316df557d	http://www.tacobell.com,https://locations.tacobell.com/tn/pigeon-forge/206-wears-valley-road.html?utm_source=yextandutm_campaign=yextpowerlistingsandutm_medium=referralandutm_term=026432andutm_content=website
AVweXN5RByjofQCxxilK	2017-01-03 07:46:11	2018-06-26 02:59:51	3652 Parkway	Fast Food	Pigeon Forge	US	us/tn/pigeonforge/3652parkway/93075755	35.78234	-83.55141	Arby’s	37863	TN	http://www.yellowbook.com/profile/arbys_1633893026.html,https://foursquare.com/v/arbys/4bae29f8f964a520348c3be3,https://www.yellowpages.com/pigeon-forge-tn/mip/arbys-6911678,https://www.allmenus.com/tn/pigeon-forge/166343-arbys/menu/,http://tripadvisor.com/Restaurant_Review-g55270-d1123265-Reviews-Arby_s-Pigeon_Forge_Tennessee.html,http://www.citysearch.com/profile/9409306/pigeon_forge_tn/arby_s.html	http://www.arbys.com,https://locations.arbys.com/us/tn/pigeon-forge/3652-parkway.html
AWQ6MUvo3-Khe5l_j3SG	2018-06-26 02:59:43	2018-06-26 02:59:43	2118 Mt Zion Parkway	Fast Food Restaurant	Morrow	US	us/ga/morrow/2118mtzionparkway/1305117222	33.56274	-84.32114	Steak ’n Shake	30260	GA	https://foursquare.com/v/steak-n-shake/4bcf77a741b9ef3bb87df8e5	http://www.steaknshake.com/locations/23851-steak-n-shake-mt-zion-parkway-morrow
AVwc57jLkufWRAb50ROs	2015-10-23 23:59:49	2018-06-26 02:59:43	9768 Grand River Ave	Fast Food Restaurant	Detroit	US	us/mi/detroit/9768grandriverave/-791445730	42.36882	-83.13825	Wendy’s	48204	MI	https://foursquare.com/v/wendys/4bfec191e584c928932f6d25,http://tripadvisor.com/Restaurant_Review-g42139-d4455438-Reviews-Wendy_s-Detroit_Michigan.html,http://www.yellowpages.com/detroit-mi/mip/wendys-5831913	http://www.wendys.com

After a first look at the data, we see that there are many features, but this analysis only needs a few of them. To keep the data organized and focused on the analysis, I trim the dataset to those features.

ff_data <- ff_data[,c("city", "name", "province")]
kable(head(ff_data))

city	name	province
Thibodaux	SONIC Drive In	LA
Thibodaux	SONIC Drive In	LA
Pigeon Forge	Taco Bell	TN
Pigeon Forge	Arby’s	TN
Morrow	Steak ’n Shake	GA
Detroit	Wendy’s	MI

Now we have simplified the data to only 3 main features: city, name, and province (state). Our data now has 10,000 rows and 3 columns.

As a bit of data cleaning, we convert all names to lower case and remove punctuation. This merges duplicate spellings of the same chain, for example Chick-fil-A vs. Chick-Fil-A. While this catches most duplicates, repeats remain where a chain appears under different versions of its name: for example, "five guys" and "five guys burgers and fries" will not be combined. However, this does not appear to have a disruptive effect on the analysis.

ff_data$name <- tolower(ff_data$name)
ff_data$name <- removePunctuation(ff_data$name)
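To see what these two cleaning steps do, here is a minimal base-R sketch. It assumes that `gsub("[[:punct:]]+", "", x)` approximates `tm::removePunctuation()` with its default settings for plain ASCII punctuation, and it adds a hypothetical `aliases` lookup (not part of the original analysis) showing one way name variants such as "five guys burgers and fries" could also be merged.

```r
# Base-R approximation of tolower() + tm::removePunctuation()
# (assumption: equivalent for ASCII punctuation under default settings)
clean_name <- function(x) {
  gsub("[[:punct:]]+", "", tolower(x))
}

# Hypothetical alias table mapping a cleaned variant to one canonical name
aliases <- c("five guys burgers and fries" = "five guys")

# Clean the names, then replace any cleaned name listed in the alias table
canonicalize <- function(x, aliases) {
  cleaned <- clean_name(x)
  hits <- cleaned %in% names(aliases)
  cleaned[hits] <- unname(aliases[cleaned[hits]])
  cleaned
}

raw <- c("Chick-fil-A", "Chick-Fil-A", "McDonald's",
         "Five Guys", "Five Guys Burgers and Fries")
clean_name(raw)             # case and punctuation removed only
canonicalize(raw, aliases)  # listed variants also merged
```

Names that do not appear in the alias table pass through `canonicalize()` unchanged, so the lookup only affects variants someone has explicitly listed.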

Having examined the data and organized the parts we will use, we can now dive into the analysis.

Data Analysis

Question 1: What are the most popular (abundant) fast food restaurants in the US?

For this analysis, we treat the most popular restaurants as those with the most locations. Before looking at which restaurants are most popular, let’s check how many distinct restaurant names the dataset contains:

cat("Number of unique fast food restaurants:", length(unique(ff_data$name)))

## Number of unique fast food restaurants: 542

Next we want to see how common each of the fast food restaurants is.

ff_freqs <- ff_data %>%
  group_by(name) %>%
  summarise(freq = n())  # number of records per restaurant name
kable(head(ff_freqs))

name	freq
7eleven	19
90 miles cuban cafe	1
abruzzi pizza	1
acropolis gyro palace	1
adobe cantina salsa	1
ak buffet	1

Looking at just this brief printout of the frequencies, we see that many restaurants have only 1 recorded location. Since we are mainly interested in the most popular restaurants, we order by frequency to bring them to the top.

ff_freqs <- ff_freqs[order(-ff_freqs$freq),]
kable(head(ff_freqs))

name	freq
mcdonalds	1948
taco bell	1032
burger king	833
subway	833
arbys	666
wendys	628

When we order by frequency, the top restaurants are names we are very familiar with. It is not surprising to see McDonalds at the top of the list.

Using the frequencies, we can construct a barplot of the top restaurants to visualize the data more easily.

ggplot(data=ff_freqs[1:15,], aes(x=reorder(name, -freq), y=freq)) + geom_bar(stat="identity", fill="dodgerblue") + 
labs(title="Top fifteen fast food restaurants in America", x="Restaurant name", y="Number of restaurant locations") + 
theme(axis.text.x = element_text(angle=90))

The names in the graph above hold no surprises about which fast food restaurants are most abundant. McDonalds far exceeds its competitors, with 916 more locations than the second most abundant chain, Taco Bell (1,948 vs. 1,032). After Wendys, which has 628 locations, the number of locations drops off for the remaining chains.
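The comparisons above can be checked directly from the sorted frequency table. The sketch below copies the six printed counts by hand and assumes 10,000 total records, as stated earlier; it computes how far each chain is ahead of the next one and each chain’s share of all records.

```r
# Top six counts, copied from the sorted frequency table above
top6 <- c("mcdonalds" = 1948, "taco bell" = 1032, "burger king" = 833,
          "subway" = 833, "arbys" = 666, "wendys" = 628)
total_records <- 10000  # total rows in the dataset, as stated earlier

# Lead of each chain over the next one down the list
lead_over_next <- -diff(top6)
names(lead_over_next) <- names(top6)[-length(top6)]
lead_over_next

# Percentage of all records for each chain, then for the top six combined
round(100 * top6 / total_records, 1)
100 * sum(top6) / total_records
```

By this arithmetic, McDonalds alone accounts for about 19.5% of the records, and the six chains together for about 59.4%.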

Now that we know the most abundant/popular fast food chains in the US overall, I want to take a deeper dive into the state level. To do this we first need to find which fast food restaurant has the most locations in each state.

state_tops <- ff_data %>%
  group_by(province) %>%
  count(name) %>%
  top_n(1)

## Selecting by n

state_tops <- distinct(state_tops, province, .keep_all = TRUE)
names(state_tops) <- c("state", "name", "n")
kable(head(state_tops))

state	name	n
AK	subway	4
AL	taco bell	3
AR	mcdonalds	18
AZ	mcdonalds	59
CA	mcdonalds	158
CO	taco bell	27
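One caveat with this step: `top_n(1)` keeps every chain tied for first place in a state, and `distinct()` then keeps only the first of them (in practice the alphabetically first, since the counts are ordered by name). With few records per state, as in Alabama, ties are plausible. The base-R sketch below flags states whose top spot is shared; it runs on a small hypothetical data frame with made-up state codes `S1` and `S2`, standing in for `ff_data`.

```r
# Hypothetical stand-in for ff_data: S1 has a tie, S2 has a clear leader
toy <- data.frame(
  province = c("S1", "S1", "S1", "S1", "S2", "S2", "S2"),
  name     = c("x", "x", "y", "y", "x", "z", "z"),
  stringsAsFactors = FALSE
)

# Count records per (state, chain) pair and drop empty combinations
counts <- as.data.frame(table(province = toy$province, name = toy$name))
counts <- counts[counts$Freq > 0, ]

# Keep every chain whose count equals its state's maximum
state_max <- ave(counts$Freq, counts$province, FUN = max)
leaders <- counts[counts$Freq == state_max, ]

# States with more than one leader have a tie for "most popular"
leaders_per_state <- table(leaders$province)
tied_states <- names(leaders_per_state)[leaders_per_state > 1]
tied_states
```

In dplyr terms, the same check is `count(province, name)`, then `group_by(province)` and `filter(n == max(n))`, followed by a second `filter(n() > 1)` to keep only the states with more than one leader.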

Before visualizing, let’s count how many distinct chains are the most common restaurant in at least one state:

cat("Number of Unique Most Popular Restaurants:", length(unique(state_tops$name)))

## Number of Unique Most Popular Restaurants: 6

Now let’s look at this information on a map, which shows the most common fast food chain in each state.

plot_usmap(data = state_tops, values = "name") + 
  scale_fill_discrete(name = "name") + 
  theme(legend.position = "right") + labs(title = "Most popular fast food restaurant per state")

The map above confirms that McDonalds is by far the most abundant chain throughout the United States. The states where a different chain leads are concentrated in the middle of the country.

Question 2: In what US location is fast food most abundant?

To begin answering our next question, I want to look at the restaurants by state.

state_ff <- ff_data %>%
  group_by(province) %>%
  summarise(number = n())  # number of records per state
names(state_ff) <- c("state", "number")  # rename province to state for merging and mapping
kable(head(state_ff))

state	number
AK	16
AL	6
AR	102
AZ	330
CA	1201
CO	148

As we see here, the number of fast food restaurants recorded varies widely by state. Because the data covers only 10,000 restaurant records across the US (and there are certainly more restaurants than that), it is not complete. However, we will treat it as a sample for understanding broader trends. One notable gap is Alabama: only 6 restaurants are recorded there, which is certainly an undercount.

Using the above state data, we can now create a barplot to visualize which states have the most fast food restaurants.

ggplot(data=state_ff, aes(x=reorder(state, -number), y=number)) + geom_bar(stat="identity", fill="pink") + 
labs(title="Number of fast food restaurants per state", x="State", y="Number of restaurants") + 
theme(axis.text.x = element_text(angle=90))

Looking at the barplot above, we see that California far exceeds the other states, with 1,201 fast food restaurants. The next states by number of restaurants are Texas, Florida, Ohio, Georgia, and Illinois.

Now let’s look at this information on a map to better visualize the geographic spread.

plot_usmap(data = state_ff, values = "number") + 
  scale_fill_continuous(low = "white", high = "blue",
                        name = "Number of fast food restaurants", label = scales::comma) + 
  theme(legend.position = "right") + 
  labs(title = "Number of fast food restaurants across the US")

In this map, the darker the color, the more fast food restaurants there are in that state. California stands out instantly as the only state shaded near the darkest blue. Next we notice states like Texas, Florida, and Ohio, which the barplot also showed to have high numbers of fast food restaurants.

The fact that California, Texas, and Florida have the most fast food restaurants in this data suggests a possible confounding variable: these are also the three most populous states. Question 3 examines population as a possible reason these states top the list.

Question 3: Is the number of fast food restaurants related to population?

A question raised by the results of the last question is whether the number of fast food restaurants in a state is related to its population. To explore this, I first combine state population data, gathered from the US Census Bureau, with the current data.

Next we set up the new data. We keep only the state abbreviations and the corresponding populations, then merge this population data with the state counts from question 2.

pop_data <- read_csv("population2018.csv")

## Parsed with column specification:
## cols(
##   State = col_character(),
##   Population = col_number()
## )

names(pop_data) <- c("state", "population")
state_ff_pops <- merge(state_ff, pop_data)
kable(head(state_ff_pops))

state	number	population
AK	16	737438
AK	16	3013825
AL	6	4887871
AZ	330	7171646
CA	1201	39557045
CO	148	5695564
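Notice that the merged table above contains two rows for AK and none for AR. Alaska’s 2018 population is about 737,000, while 3,013,825 matches the Census Bureau’s 2018 estimate for Arkansas, so it appears that Arkansas is labeled "AK" in population2018.csv. As a result, Arkansas (102 restaurants) drops out of the merge and Alaska is counted twice. The sketch below shows one way to catch and correct this before merging; it rebuilds a few rows from the printed tables, and the recoding line encodes the assumed mislabeling rather than anything confirmed in the source file.

```r
# A few rows rebuilt from the tables printed above
state_ff <- data.frame(state  = c("AK", "AL", "AR", "AZ"),
                       number = c(16, 6, 102, 330),
                       stringsAsFactors = FALSE)
pop_data <- data.frame(state      = c("AK", "AK", "AL", "AZ"),
                       population = c(737438, 3013825, 4887871, 7171646),
                       stringsAsFactors = FALSE)

# Sanity checks before merging: duplicated keys and states with no population
dup_states <- unique(pop_data$state[duplicated(pop_data$state)])
unmatched  <- setdiff(state_ff$state, pop_data$state)
dup_states  # "AK"
unmatched   # "AR"

# Assumption: the 3,013,825 row is Arkansas mislabeled as "AK"
pop_data$state[pop_data$state == "AK" & pop_data$population == 3013825] <- "AR"

# all.x = TRUE keeps any state still lacking a population (as NA) visible
state_ff_pops <- merge(state_ff, pop_data, by = "state", all.x = TRUE)
state_ff_pops
```

After such a fix, Arkansas would have 102 / 3,013,825 restaurants per capita, roughly 3.4 per 100,000 residents.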

Now that we have the data set up, we can dive into the analysis. To look for a pattern between number of fast food restaurants and population, we will first start with a scatterplot.

ggplot() + geom_point(data=state_ff_pops, aes(x=number, y=population), color = "orange") + labs(title="Number of fast food restaurants per state vs state population", x="Number of fast food restaurants", y="Population")

Based on this scatterplot, there appears to be a strong positive relationship between the number of fast food restaurants in a state and the state’s population.
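The strength of this relationship can also be quantified rather than judged by eye. Below is a minimal sketch using only the five correctly labeled rows printed above (AK with its Alaska population, AL, AZ, CA, CO), purely for illustration; on the full data one would pass the `number` and `population` columns of `state_ff_pops` instead. Pearson’s correlation measures linear association but is strongly influenced by a very large state such as California, so Spearman’s rank correlation is shown alongside it.

```r
# Five rows copied from the merged table above (illustration only)
rows <- data.frame(
  state      = c("AK", "AL", "AZ", "CA", "CO"),
  number     = c(16, 6, 330, 1201, 148),
  population = c(737438, 4887871, 7171646, 39557045, 5695564),
  stringsAsFactors = FALSE
)

# Linear association (sensitive to outliers such as CA)
pearson <- cor(rows$number, rows$population, method = "pearson")

# Rank-based association (less affected by a single very large state)
spearman <- cor(rows$number, rows$population, method = "spearman")

# Simple linear fit: slope is restaurants per additional resident
fit <- lm(number ~ population, data = rows)

c(pearson = pearson, spearman = spearman)
coef(fit)
```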

To build on this analysis, we can also look at the number of fast food restaurants per capita in each state.

state_ff_pops$percap <- state_ff_pops$number/state_ff_pops$population
kable(head(state_ff_pops))

state	number	population	percap
AK	16	737438	2.17e-05
AK	16	3013825	5.30e-06
AL	6	4887871	1.20e-06
AZ	330	7171646	4.60e-05
CA	1201	39557045	3.04e-05
CO	148	5695564	2.60e-05
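Per-capita values around 1e-05 are hard to read on a map legend. A common alternative, sketched below as an assumption rather than the original method, is to express the rate per 100,000 residents; this changes only the scale, not the ranking of states. The rows are copied from the table above, leaving out the second AK row (which holds Arkansas’s population).

```r
# Rows copied from the per-capita table above (second AK row left out)
rates <- data.frame(
  state      = c("AK", "AL", "AZ", "CA", "CO"),
  number     = c(16, 6, 330, 1201, 148),
  population = c(737438, 4887871, 7171646, 39557045, 5695564),
  stringsAsFactors = FALSE
)

# Restaurants per 100,000 residents: same ranking as number / population
rates$per100k <- 1e5 * rates$number / rates$population

# Rank states from most to fewest restaurants per 100,000 residents
rates[order(-rates$per100k), c("state", "per100k")]
```

With a `per100k` column added to `state_ff_pops` the same way, the map code could use `values = "per100k"` instead of `"percap"`.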

With the per capita figures, we can return to a state map, this time accounting for each state’s population.

plot_usmap(data = state_ff_pops, values = "percap") + 
  scale_fill_continuous(low = "white", high = "blue",
                        name = "Fast food restaurants per capita", label = scales::comma) + 
  theme(legend.position = "right") + 
  labs(title = "Number of fast food restaurants per capita across the US")

In this map, the darker the color, the more fast food restaurants per capita there are in that state. Whereas California, Texas, and Florida stood out earlier, once population is taken into account these states are no longer at the top. Instead, states such as Wyoming, Arizona, South Dakota, and North Dakota have the most fast food restaurants per capita.

Conclusion

It is clear, both from everyday life and from this data, that fast food is a very popular type of establishment in the United States today. We constantly pass different chains as we drive through cities. This analysis let us take a closer look at the characteristics of fast food restaurants in modern America.

Our first question asked which fast food restaurants are the most popular in the US. For this question, popular was synonymous with abundant. This simplification may not reflect customers’ true preferences, though it is fair to assume that a chain would not have so many locations if it were not popular. We learned that McDonalds is by far the most abundant fast food restaurant, followed by Taco Bell, Burger King, Subway, and Arbys. Breaking the data down by state, only 6 different chains were the most common in any state, and McDonalds again came out on top. The states where a different chain led were concentrated in the middle of the country.

Our next question looked at the locations of fast food restaurants: in which US state is fast food most abundant? While the data was imperfect, we did discover some key trends. California far exceeded all other states in its number of fast food restaurants (1,201). Other states with very large numbers of restaurants include Texas, Florida, and Ohio. These findings raised the next question: is the number of fast food restaurants related to population? Since California, Texas, and Florida are the three most populous states, the answer seemed likely to be yes even before any analysis for question three.

As stated above, question three investigated whether the number of fast food restaurants in a state is related to that state’s population. The results of question two suggested that it was. Plotting state population against the number of fast food restaurants per state from question two showed a strong positive, roughly linear relationship between the two variables. With this in mind, we adjusted the findings from question two to see which states have the most fast food restaurants per capita. Frontrunners here included Wyoming, Arizona, and South Dakota, a very different list from the one before controlling for population.

Overall, this analysis did not yield any groundbreaking discoveries, but it does help us understand something common in the world around us. Fast food certainly isn’t going anywhere for now, and as we’ve seen, it is incredibly widespread across the country. McDonalds takes the lead, but perhaps some key competitors will break through in the future.

Stats 32 Final Project: Fast Food in America